Manipulating Attributes of Natural Scenes via Hallucination
In this study, we explore building a two-stage framework for enabling users
to directly manipulate high-level attributes of a natural scene. The key to our
approach is a deep generative network which can hallucinate images of a scene
as if they were taken in a different season (e.g. during winter), weather
condition (e.g. on a cloudy day), or time of day (e.g. at sunset). Once the
scene is hallucinated with the given attributes, the corresponding look is then
transferred to the input image while keeping its semantic details intact,
giving a photo-realistic manipulation result. As the proposed framework
hallucinates what the scene will look like, it does not require any reference
style image, as commonly used in most appearance and style transfer
approaches. Moreover, it can simultaneously manipulate a given scene
according to a diverse set of transient attributes within a single model,
eliminating the need to train a separate network for each translation task.
Our comprehensive set of qualitative and quantitative results demonstrates
the effectiveness of our approach against competing methods.
Comment: Accepted for publication in ACM Transactions on Graphics
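To make the two-stage pipeline concrete, the sketch below conditions a toy generator on a vector of transient attributes and then hands the hallucinated output to a photorealistic style-transfer routine. This is a minimal PyTorch illustration under assumed names: the AttributeGenerator module, the 40-attribute vector, and the transfer_style hook are hypothetical stand-ins, not the authors' architecture.

```python
# Minimal sketch of the two-stage manipulation pipeline (hypothetical
# module names, not the paper's released code).
import torch
import torch.nn as nn

class AttributeGenerator(nn.Module):
    """Toy conditional generator: image + attribute vector -> hallucinated scene."""
    def __init__(self, num_attributes=40):  # 40 is an assumed attribute count
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        # The attribute vector is broadcast spatially and fused with image features.
        self.fuse = nn.Conv2d(128 + num_attributes, 128, 3, padding=1)
        self.decoder = nn.Sequential(
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, image, attributes):
        feats = self.encoder(image)
        b, _, h, w = feats.shape
        attr_map = attributes[:, :, None, None].expand(b, -1, h, w)
        return self.decoder(self.fuse(torch.cat([feats, attr_map], dim=1)))

def manipulate(image, attributes, generator, transfer_style):
    # Stage 1: hallucinate the scene under the target transient attributes.
    hallucinated = generator(image, attributes)
    # Stage 2: transfer the hallucinated look onto the input image; any
    # photorealistic style-transfer routine could fill this slot.
    return transfer_style(content=image, style=hallucinated)

generator = AttributeGenerator()
image = torch.rand(1, 3, 256, 256) * 2 - 1   # toy input in [-1, 1]
attributes = torch.zeros(1, 40)
attributes[0, 0] = 1.0                        # e.g. switch on "winter"
hallucinated = generator(image, attributes)   # same spatial size as the input
```

In this setup a single generator serves every attribute combination; switching the target look is just a change of the conditioning vector.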
Detecting Euphemisms with Literal Descriptions and Visual Imagery
This paper describes our two-stage system for the Euphemism Detection shared
task hosted by the 3rd Workshop on Figurative Language Processing in
conjunction with EMNLP 2022. Euphemisms tone down expressions about sensitive
or unpleasant issues like addiction and death. The ambiguous nature of
euphemistic words or expressions makes it challenging to detect their actual
meaning within a context. In the first stage, we seek to mitigate this
ambiguity by incorporating literal descriptions into input text prompts to our
baseline model. This kind of direct supervision yields a remarkable
performance improvement. In the second stage, we integrate visual
supervision into our system using visual imagery: two sets of images
generated by a text-to-image model from the terms and their descriptions.
Our experiments demonstrate that visual supervision also gives a statistically
significant performance boost. Our system achieved second place with an F1
score of 87.2%, only about 0.9% below the best submission.
Comment: 7 pages, 1 table, 1 figure. Accepted to the 3rd Workshop on
Figurative Language Processing at EMNLP 2022.
https://github.com/ilkerkesen/euphemis
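As a concrete illustration of the first-stage prompt construction, the sketch below appends a term's literal description to the input before classification, using Hugging Face Transformers. The roberta-large backbone, the prompt template, and the binary label convention are assumptions for illustration, not the system's exact configuration.

```python
# Hedged sketch of euphemism detection with literal-description prompts
# (assumed backbone and template, not the authors' exact setup).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "roberta-large"  # assumed baseline; the shared-task system may differ
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
model.eval()

def classify(context: str, term: str, description: str) -> int:
    """Return 1 if `term` is judged euphemistic in `context`, else 0."""
    # The literal description is injected as direct supervision, mirroring
    # the prompt construction described above.
    prompt = f"{context} [SEP] {term} means: {description}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

print(classify("He passed away last night.",  # context with candidate term
               "passed away",                  # potentially euphemistic term
               "to die"))                      # literal description
```

The second stage would add image features derived from a text-to-image model alongside this text input; that fusion is omitted here for brevity.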
Spherical Vision Transformer for 360-degree Video Saliency Prediction
The growing interest in omnidirectional videos (ODVs), which capture the full
field of view (FOV), has made 360-degree saliency prediction increasingly
important in computer vision. However, predicting where humans look in
360-degree scenes
presents unique challenges, including spherical distortion, high resolution,
and limited labelled data. We propose a novel vision-transformer-based model
for omnidirectional videos named SalViT360 that leverages tangent image
representations. We introduce a spherical geometry-aware spatiotemporal
self-attention mechanism that enables effective omnidirectional video
understanding. Furthermore, we present a consistency-based unsupervised
regularization term for projection-based 360-degree dense-prediction models to
reduce artefacts in the predictions that occur after inverse projection. Our
approach is the first to employ tangent images for omnidirectional saliency
prediction, and our experimental results on three ODV saliency datasets
demonstrate its effectiveness compared to the state of the art.
Comment: 12 pages, 4 figures, accepted to BMVC 202
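The tangent-image representation underlying SalViT360 can be sketched with the standard inverse gnomonic projection: each tangent view is a perspective-like crop of the sphere, resampled from the equirectangular frame. In the sketch below, the function name, patch size, and field of view are illustrative choices, not the paper's configuration.

```python
# Sketch: sample one tangent-plane view from an equirectangular frame via
# inverse gnomonic projection (illustrative parameters, not the paper's).
import math
import torch
import torch.nn.functional as F

def tangent_patch(erp, lat0, lon0, fov=math.pi / 3, size=224):
    """Sample a tangent view centred at (lat0, lon0), in radians, from an
    equirectangular frame `erp` of shape (B, C, H, W)."""
    half = math.tan(fov / 2)
    ys = torch.linspace(half, -half, size)   # image rows run top to bottom
    xs = torch.linspace(-half, half, size)
    y, x = torch.meshgrid(ys, xs, indexing="ij")
    rho = torch.sqrt(x ** 2 + y ** 2).clamp(min=1e-8)
    c = torch.atan(rho)
    sin_c, cos_c = torch.sin(c), torch.cos(c)
    # Inverse gnomonic projection: tangent-plane coords -> sphere coords.
    lat = torch.asin((cos_c * math.sin(lat0)
                      + y * sin_c * math.cos(lat0) / rho).clamp(-1.0, 1.0))
    lon = lon0 + torch.atan2(
        x * sin_c, rho * math.cos(lat0) * cos_c - y * math.sin(lat0) * sin_c)
    lon = torch.remainder(lon + math.pi, 2 * math.pi) - math.pi  # wrap
    # Normalise (lon, lat) to [-1, 1] grid coordinates for grid_sample.
    grid = torch.stack([lon / math.pi, -2.0 * lat / math.pi], dim=-1)
    grid = grid.unsqueeze(0).expand(erp.shape[0], -1, -1, -1)
    return F.grid_sample(erp, grid, align_corners=False)

frame = torch.rand(1, 3, 512, 1024)                # toy equirectangular frame
patch = tangent_patch(frame, lat0=0.0, lon0=0.0)   # -> (1, 3, 224, 224)
```

A set of such patches covering the sphere would then be fed to the transformer, with the spherical geometry-aware self-attention operating across patches and frames.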